{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Exporting Categorical Data of a Time Series as Binary Tables using pandas get_dummies\n",
    "\n",
    "In this tutorial, we will use the `get_dummies` function from the pandas library to convert the categorical labels of a time series into a binary (one-hot) table and export it.\n",
    "\n",
    "## Prerequisites\n",
    "\n",
    "Before we begin, make sure you have the following libraries installed:\n",
    "\n",
    "- pandas: a library for data manipulation and analysis in Python\n",
    "- numpy: a library for scientific computing in Python\n",
    "\n",
    "You can install these libraries using pip:\n",
    "````\n",
    "pip install pandas numpy\n",
    "````\n",
    "\n",
    "> Both libraries are already installed in the A-SOiD environment, so you do not need to install them again."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "\n",
    "## Step 1: Import the Required Libraries\n",
    "\n",
    "First, let's import the required libraries:\n"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Step 2: Load the Data\n",
    "\n",
    "Next, let's load the data into a pandas DataFrame. For this tutorial, we will generate a synthetic dataset with numpy: one row per time step and a `category` column holding one of three labels (`A`, `B`, `C`). Any dataset with a categorical column works the same way."
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "         time category\n",
      "0  2022-01-01        A\n",
      "1  2022-01-02        B\n",
      "2  2022-01-03        A\n",
      "3  2022-01-04        B\n",
      "4  2022-01-05        B\n",
      "5  2022-01-06        C\n",
      "6  2022-01-07        A\n",
      "7  2022-01-08        C\n",
      "8  2022-01-09        A\n",
      "9  2022-01-10        A\n",
      "10 2022-01-11        A\n",
      "11 2022-01-12        C\n",
      "12 2022-01-13        B\n",
      "13 2022-01-14        C\n",
      "14 2022-01-15        C\n",
      "15 2022-01-16        A\n",
      "16 2022-01-17        B\n",
      "17 2022-01-18        B\n",
      "18 2022-01-19        B\n",
      "19 2022-01-20        B\n",
      "20 2022-01-21        A\n",
      "21 2022-01-22        B\n",
      "22 2022-01-23        A\n",
      "23 2022-01-24        A\n",
      "24 2022-01-25        B\n",
      "25 2022-01-26        C\n",
      "26 2022-01-27        A\n",
      "27 2022-01-28        C\n",
      "28 2022-01-29        A\n",
      "29 2022-01-30        B\n",
      "30 2022-01-31        B\n"
     ]
    }
   ],
   "source": [
    "# Generate a synthetic dataset with categorical data (fixed seed for reproducibility)\n",
    "np.random.seed(0)\n",
    "df = pd.DataFrame({\n",
    "    'time': pd.date_range('2022-01-01', '2022-01-31'),\n",
    "    'category': np.random.choice(['A', 'B', 'C'], size=31)\n",
    "})\n",
    "\n",
    "print(df)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
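    "If you already have data in CSV format, you can load it with `read_csv` instead. Replace the placeholder path with the location of your file; `index_col=0` tells pandas to use the first column as the index. Skip this cell if you are following along with the synthetic dataset."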
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Replace the placeholder path with your own file; the first column becomes the index\n",
    "df = pd.read_csv(\"PATH/TO/DATA.csv\", index_col=0)\n",
    "print(df)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "\n",
    "## Step 3: Use `get_dummies` to Create Binary Tables\n",
    "\n",
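    "Now, let's use the `get_dummies` function to create a binary table from the categorical column. Each category becomes its own column, and each row holds a `1` in the column of its label and a `0` everywhere else."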
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    A  B  C\n",
      "0   1  0  0\n",
      "1   0  1  0\n",
      "2   1  0  0\n",
      "3   0  1  0\n",
      "4   0  1  0\n",
      "5   0  0  1\n",
      "6   1  0  0\n",
      "7   0  0  1\n",
      "8   1  0  0\n",
      "9   1  0  0\n",
      "10  1  0  0\n",
      "11  0  0  1\n",
      "12  0  1  0\n",
      "13  0  0  1\n",
      "14  0  0  1\n",
      "15  1  0  0\n",
      "16  0  1  0\n",
      "17  0  1  0\n",
      "18  0  1  0\n",
      "19  0  1  0\n",
      "20  1  0  0\n",
      "21  0  1  0\n",
      "22  1  0  0\n",
      "23  1  0  0\n",
      "24  0  1  0\n",
      "25  0  0  1\n",
      "26  1  0  0\n",
      "27  0  0  1\n",
      "28  1  0  0\n",
      "29  0  1  0\n",
      "30  0  1  0\n"
     ]
    }
   ],
   "source": [
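    "# Create a binary table using get_dummies.\n",
    "# dtype=int keeps the values as 0/1; recent pandas versions return True/False by default.\n",
    "dummies = pd.get_dummies(df['category'], dtype=int)\n",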
    "\n",
    "print(dummies)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
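  {
   "cell_type": "markdown",
   "source": [
    "As a quick sanity check, a one-hot table should contain exactly one `1` per row. The sketch below is a minimal, self-contained illustration and not part of the original workflow: the small `labels` Series is a hypothetical stand-in for `df['category']`, and `dtype=int` is passed explicitly so the result holds integers regardless of the pandas version.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-in for df['category']\n",
    "labels = pd.Series(['A', 'B', 'A', 'C'], name='category')\n",
    "\n",
    "# One column per category; dtype=int requests 0/1 integers\n",
    "one_hot = pd.get_dummies(labels, dtype=int)\n",
    "\n",
    "# Every row should have exactly one active category\n",
    "rows_ok = bool((one_hot.sum(axis=1) == 1).all())\n",
    "\n",
    "# Recover the original labels from the table as a round-trip check\n",
    "recovered = one_hot.idxmax(axis=1)\n",
    "round_trip_ok = recovered.tolist() == labels.tolist()\n",
    "\n",
    "print(rows_ok, round_trip_ok)\n",
    "```\n",
    "\n",
    "If either check fails on your own data, look for missing values in the label column, since `get_dummies` encodes `NaN` as a row of zeros by default."
   ],
   "metadata": {
    "collapsed": false
   }
  },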
  {
   "cell_type": "markdown",
   "source": [
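    "## Step 4: Convert index to timesteps\n",
    "\n",
    "A-SOiD expects the index to hold timesteps rather than row numbers. In this example we assume one row per frame at 10 Hz, so consecutive rows are 0.1 s apart; adjust the step to match the sampling rate of your own data."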
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "      A  B  C\n",
      "time         \n",
      "0.0   1  0  0\n",
      "0.1   0  1  0\n",
      "0.2   1  0  0\n",
      "0.3   0  1  0\n",
      "0.4   0  1  0\n",
      "0.5   0  0  1\n",
      "0.6   1  0  0\n",
      "0.7   0  0  1\n",
      "0.8   1  0  0\n",
      "0.9   1  0  0\n",
      "1.0   1  0  0\n",
      "1.1   0  0  1\n",
      "1.2   0  1  0\n",
      "1.3   0  0  1\n",
      "1.4   0  0  1\n",
      "1.5   1  0  0\n",
      "1.6   0  1  0\n",
      "1.7   0  1  0\n",
      "1.8   0  1  0\n",
      "1.9   0  1  0\n",
      "2.0   1  0  0\n",
      "2.1   0  1  0\n",
      "2.2   1  0  0\n",
      "2.3   1  0  0\n",
      "2.4   0  1  0\n",
      "2.5   0  0  1\n",
      "2.6   1  0  0\n",
      "2.7   0  0  1\n",
      "2.8   1  0  0\n",
      "2.9   0  1  0\n",
      "3.0   0  1  0\n"
     ]
    }
   ],
   "source": [
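    "# Convert the row numbers into timesteps, since A-SOiD expects a time index (here: 10 Hz, i.e. 0.1 s steps).\n",
    "# Dividing by 10 instead of multiplying by 0.1 avoids values such as 0.30000000000000004 in the exported file.\n",
    "dummies['time'] = dummies.index / 10\n",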
    "dummies = dummies.set_index('time')\n",
    "\n",
    "print(dummies)"
   ],
   "metadata": {
    "collapsed": false
   }
  },
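  {
   "cell_type": "markdown",
   "source": [
    "If your recording uses a different sampling rate, the same conversion can be wrapped in a small helper. This is only a sketch under stated assumptions: `index_to_seconds` and its `sampling_rate_hz` parameter are hypothetical names introduced here, the `frames` table is made-up example data, and the rows are assumed to be evenly spaced frames starting at time zero.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def index_to_seconds(table, sampling_rate_hz):\n",
    "    # Hypothetical helper: return a copy of `table` indexed by time in seconds\n",
    "    out = table.reset_index(drop=True)\n",
    "    # Dividing the frame number by the rate avoids the rounding noise\n",
    "    # that multiplying by a step such as 0.1 can introduce\n",
    "    out.index = pd.Index(out.index / sampling_rate_hz, name='time')\n",
    "    return out\n",
    "\n",
    "\n",
    "# Made-up example data: four frames, two categories\n",
    "frames = pd.DataFrame({'A': [1, 0, 0, 1], 'B': [0, 1, 1, 0]})\n",
    "timed = index_to_seconds(frames, sampling_rate_hz=10)\n",
    "print(timed.index.tolist())\n",
    "```\n",
    "\n",
    "Keeping the rate as an explicit argument makes it harder to forget to update the step size when switching between recordings."
   ],
   "metadata": {
    "collapsed": false
   }
  },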
  {
   "cell_type": "markdown",
   "source": [
    "\n",
    "## Step 5: Save the binary table as a csv file\n",
    "\n",
    "To save a pandas DataFrame as a CSV (comma-separated values) file, use the `to_csv` method. Passing `index=True` writes the `time` index as the first column of the file:"
   ],
   "metadata": {
    "collapsed": false
   }
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "outputs": [],
   "source": [
    "# Save the DataFrame as a CSV file; index=True keeps the time index as the first column\n",
    "dummies.to_csv('dummies.csv', index=True)"
   ],
   "metadata": {
    "collapsed": false
   }
  }
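  ,
  {
   "cell_type": "markdown",
   "source": [
    "To confirm that the export worked, you can read the file back and compare it with the table in memory. The sketch below is self-contained and makes a few assumptions: `table` is a small hypothetical stand-in for `dummies`, and the file is written to a temporary directory so that nothing on disk is overwritten.\n",
    "\n",
    "```python\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical two-column table standing in for `dummies`\n",
    "table = pd.DataFrame(\n",
    "    {'A': [1, 0, 1], 'B': [0, 1, 0]},\n",
    "    index=pd.Index([0.0, 0.1, 0.2], name='time'),\n",
    ")\n",
    "\n",
    "with tempfile.TemporaryDirectory() as tmp_dir:\n",
    "    path = os.path.join(tmp_dir, 'dummies.csv')\n",
    "    table.to_csv(path, index=True)\n",
    "    # index_col='time' restores the time column as the index\n",
    "    reloaded = pd.read_csv(path, index_col='time')\n",
    "\n",
    "# Compare the 0/1 values of the reloaded table with the original\n",
    "values_match = bool((reloaded.to_numpy() == table.to_numpy()).all())\n",
    "print(reloaded.shape, values_match)\n",
    "```\n",
    "\n",
    "For your own export, point `read_csv` at `dummies.csv` and compare the result with `dummies` in the same way."
   ],
   "metadata": {
    "collapsed": false
   }
  }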
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
